Memory and performance issues in parallel multifrontal factorizations and triangular solutions with sparse right-hand sides
Abstract
We consider the solution of very large sparse systems of linear equations on parallel architectures. In this context, memory is often a bottleneck that prevents or limits the use of direct solvers, especially those based on the multifrontal method. This work focuses on the memory and performance issues of the two memory- and computation-intensive phases of direct methods, namely the numerical factorization and the solution phase. The first part addresses the solution phase with sparse right-hand sides; the second part addresses the memory scalability of the multifrontal factorization.

In the first part, we focus on the triangular solution phase with multiple sparse right-hand sides, which appears in numerous applications. We place particular emphasis on the computation of entries of the inverse, where both the right-hand sides and the solution are sparse. We first present several storage schemes that enable a significant compression of the solution space, both in a sequential and in a parallel context. We then show that the way the right-hand sides are partitioned into blocks strongly influences performance, and we consider two different settings: the out-of-core case, where the aim is to reduce the number of accesses to the factors, which are stored on disk, and the in-core case, where the aim is to reduce the computational cost. Finally, we show how to enhance the parallel efficiency.

In the second part, we consider the parallel multifrontal factorization. We show that controlling the active memory specific to the multifrontal method is critical, and that commonly used mapping techniques usually fail to do so: they cannot achieve a high memory scalability, i.e., they dramatically increase the amount of memory needed by the factorization when the number of processors increases. We propose a class of "memory-aware" mapping and scheduling algorithms that aim at maximizing performance while enforcing a user-given memory constraint, and that provide robust memory estimates before the factorization. These techniques have raised performance issues in the parallel dense kernels used at each step of the factorization, for which we propose algorithmic improvements.

The ideas presented throughout this study have been implemented within the MUMPS (MUltifrontal Massively Parallel Solver) solver and experimented with on large matrices (up to a few tens of millions of unknowns) and massively parallel architectures (up to a few thousand cores). They have been shown to improve the performance and the robustness of the code, and will be available in a future release. Some of the ideas presented in the first part have also been implemented within the PDSLin (Parallel Domain decomposition Schur complement based Linear solver) package.
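The first part hinges on exploiting the sparsity of the right-hand sides: when the right-hand side is a column e_j of the identity, the forward solve L y = e_j only touches the part of the factor that is reachable from j, which is what makes the computation of selected entries of A^{-1} = U^{-1} L^{-1} tractable. The following is a minimal sketch of that reachability-based triangular solve in plain Python/SciPy; it is illustrative only (the routine name, the (indices, values) right-hand-side format, and the toy matrix are assumptions, not MUMPS code or an existing API).

```python
# Illustrative sketch: sparse lower-triangular solve L x = b that exploits
# the sparsity of b.  Only the part of L reachable from struct(b) is visited.
import numpy as np
import scipy.sparse as sp


def sparse_lower_solve(L_csc, b_idx, b_val):
    """Solve L x = b for sparse b given as (indices, values).

    L_csc must be lower triangular (CSC format) with nonzero diagonal.
    Returns only the entries of x that can be structurally nonzero.
    """
    indptr, indices, data = L_csc.indptr, L_csc.indices, L_csc.data

    # Symbolic step: struct(x) is the set of nodes reachable from struct(b)
    # in the graph of L (edge j -> i whenever L[i, j] != 0 with i > j).
    reach, stack = set(), list(b_idx)
    while stack:
        j = stack.pop()
        if j in reach:
            continue
        reach.add(j)
        for p in range(indptr[j], indptr[j + 1]):
            i = indices[p]
            if i > j and i not in reach:
                stack.append(i)

    # Numeric step: forward substitution restricted to that pattern.
    # For a lower triangular matrix, increasing index order is a valid
    # topological order, so sorting the (small) reach set is enough here.
    x = dict(zip(map(int, b_idx), map(float, b_val)))
    for j in sorted(reach):
        xj = x.get(j, 0.0)
        for p in range(indptr[j], indptr[j + 1]):
            if indices[p] == j:          # divide by the diagonal entry
                xj /= data[p]
        x[j] = xj
        for p in range(indptr[j], indptr[j + 1]):
            i = indices[p]
            if i > j:                    # scatter the update below the diagonal
                x[i] = x.get(i, 0.0) - data[p] * xj
    return {int(k): v for k, v in x.items()}


# Tiny example: solve L x = e_1; only rows {1, 3} of the factor are visited.
L = sp.csc_matrix(np.array([[2., 0., 0., 0.],
                            [1., 2., 0., 0.],
                            [0., 0., 2., 0.],
                            [0., 1., 1., 2.]]))
print(sparse_lower_solve(L, b_idx=[1], b_val=[1.0]))   # {1: 0.5, 3: -0.25}
```

A backward solve with U, restricted to the requested entries, completes the computation of an individual entry of the inverse; how the columns e_j are then grouped into blocks is precisely the partitioning question (out-of-core versus in-core) discussed in the abstract.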
Similar resources
Parallel Multifrontal Solution of Sparse Linear Least Squares Problems on Distributed-Memory Multiprocessors
We describe the issues involved in the design and implementation of efficient parallel algorithms for solving sparse linear least squares problems on distributed-memory multiprocessors. We consider both the QR factorization method due to Golub and the method of corrected semi-normal equations due to Björck. The major tasks involved are sparse QR factorization, sparse triangular solution and spa...
Efficient Parallel Solutions of Large Sparse SPD Systems on Distributed-Memory Multiprocessors
We consider several issues involved in the solution of sparse symmetric positive definite systems by the multifrontal method on distributed-memory multiprocessors. First, we present a new algorithm for computing the partial factorization of a frontal matrix on a subset of processors which significantly improves the performance of a previously designed distributed multifrontal algorithm. Second, new p...
On Evaluating Parallel Sparse Cholesky Factorizations
Though many parallel implementations of sparse Cholesky factorization, accompanied by experimental results, have been proposed, it seems hard to evaluate the performance of these factorization methods theoretically because of the irregular structure of sparse matrices. This paper is an attempt at such research. On the basis of the criteria of parallel computation and communication time, we ...
Robust Memory-Aware Mappings for Parallel Multifrontal Factorizations
We focus on memory scalability issues in multifrontal solvers like MUMPS. We illustrate why commonly used mapping strategies (e.g., a proportional mapping) cannot achieve a high memory efficiency. We propose a class of “memory-aware” algorithms that aim at maximizing performance under memory constraints. These algorithms provide both accurate memory predictions and a robust solver. We illustrat... (a sketch of this memory-aware mapping idea follows this list)
Modeling 1D Distributed-Memory Dense Kernels for an Asynchronous Multifrontal Sparse Solver
To solve sparse linear systems, multifrontal methods rely on dense partial LU decompositions of so-called frontal matrices; we consider a parallel, asynchronous setting in which several frontal matrices can be factored simultaneously. In this context, to address performance and scalability issues of acyclic pipelined asynchronous factorization kernels, we study models to revisit properties of le...
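Both the second part of the abstract and the "Robust Memory-Aware Mappings" entry above revolve around the same trade-off: proportional mapping splits the processes assigned to a subtree among its children according to workload, which favors performance but can make the per-process memory explode, whereas serializing sibling subtrees on all available processes lowers the per-process peak. The sketch below (plain Python, illustrative only; the Node fields, the crude per-process memory estimate, and the serialization fallback are simplifying assumptions, not the actual MUMPS memory-aware algorithm) shows the general shape of such a memory-constrained tree mapping.

```python
# Minimal sketch of a memory-constrained mapping of an elimination tree.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    work: float        # estimated factorization cost of the front at this node
    mem: float         # estimated storage for the front / contribution block
    children: list = field(default_factory=list)


def subtree_work(node):
    return node.work + sum(subtree_work(c) for c in node.children)


def subtree_mem(node):
    # crude estimate of the memory footprint of the whole subtree
    return node.mem + sum(subtree_mem(c) for c in node.children)


def map_tree(node, procs, M0, mapping):
    """Assign `procs` processes to `node`, respecting a per-process budget M0."""
    mapping[node.name] = procs
    kids = node.children
    if not kids:
        return
    total = sum(subtree_work(c) for c in kids)
    # Proportional mapping: split the processes according to subtree workload.
    shares = [max(1, round(procs * subtree_work(c) / total)) for c in kids]
    fits = sum(shares) <= procs and all(
        subtree_mem(c) / s <= M0 for c, s in zip(kids, shares))
    if fits:
        for c, s in zip(kids, shares):
            map_tree(c, s, M0, mapping)
    else:
        # Memory-aware fallback: process the subtrees one after another, each
        # on all `procs` processes, which lowers the per-process memory peak.
        for c in kids:
            map_tree(c, procs, M0, mapping)


# Toy elimination tree: an unbalanced root with two subtrees.
tree = Node("root", 10.0, 4.0, [
    Node("left", 50.0, 8.0, [Node("l1", 5.0, 2.0), Node("l2", 5.0, 2.0)]),
    Node("right", 5.0, 1.0, [Node("r1", 1.0, 0.5)]),
])
mapping = {}
map_tree(tree, procs=8, M0=1.5, mapping=mapping)
# With M0 = 1.5 the root's subtrees are serialized (each keeps all 8 processes)
# instead of the 7/1 split that pure proportional mapping would choose.
print(mapping)
```

With a looser budget (e.g. M0 = 3.0) the same routine splits the root's subtrees 7/1 as proportional mapping dictates, which illustrates the performance/memory trade-off the abstract describes.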
Publication date: 2012